資料庫之間"字母符號比較"測試

資料庫

暐翰 2018-06-18 00:35:43 ‧ 6559 瀏覽

分享至

今天回答IT邦友問題
MYSQL 搜尋字母符號問題 - iT 邦幫忙::一起幫忙解決難題，拯救 IT 人的一天

資料庫是mysql,
搜尋žř這樣的特殊文字會被當成zr來處理,
請問有什麼解決辦法呢?
比如有2筆資料分別是žybřidcz和zybridcz,
我下指令
select * from TestTable where text= 'žybřidcz' 或
select * from TestTable where text= 'zybridcz'
結果2筆都會出來.

看到後面一級屠豬士大大使用Postgresql輕鬆解決問題
所以想去比較不同資料庫之間的差異

先準備測試資料

|     text |
|----------|
|     tuấn |
|     tuan |
| žybřidcz |
| zybridcz |

MySQL:

假如只單純使用=，會發現四個值都被查詢出來

select *
  from TestTable
 where text = 'žybřidcz' or text = 'tuan';

|     text |
|----------|
|     tuấn |
|     tuan |
| žybřidcz |
| zybridcz |

需要使用HEX或是INSTR才能篩選掉

select * from TestTable 
where INSTR(text,'tuan') > 0 or INSTR(text,'žybřidcz') > 0;

select * from TestTable 
where HEX('tuan')  = HEX(text) or HEX('žybřidcz')  = HEX(text);
  
|     text |
|----------|
|     tuan |
| žybřidcz |

另外還有大小寫問題，查詢text = 'TUAN'可以查到資料tuan跟tuấn

select * from TestTable where text = 'TUAN';

| text |
|------|
| tuấn |
| tuan |

附上:線上測試連結

SQL Server:

假如只單純使用=不會有問題

select * from TestTable
where text = 'žybřidcz' or text = 'tuan';

|     text |
|----------|
|     tuan |
| žybřidcz |

但是，會有大小寫問題
查詢text = 'TUAN'可以查到資料tuan

select * from TestTable
where text = 'TUAN';

|     text |
|----------|
|     tuan |

可以參考我之前在SO回答的問題
sql server - SQL upper/lower + substring not working well - Stack Overflow
使用COLLATE Latin1_General_CS_AS或是ASCII解決

附上:線上測試連結

Postgresql跟Oracle:

不需要Hex、collate直接=比較就可以，也沒有大小寫問題
附上:Postgresql線上測試連結
附上:Oracle線上測試連結

select * from TestTable
where text = 'žybřidcz' or text = 'tuan';
 
|     text |
|----------|
|     tuan |
| žybřidcz |

總結:

資料庫	特殊字母符號定序問題	字母定序大小寫問題
MySQL	有	有
sql-server	無	有
Postgresql	無	無
Oracle	無	無

假如大大們知道原理，或是有其他看法都可以討論。

直播研討會

{{ item.channelVendor }} {{ item.webinarstarted }} |

直播中

3 則留言

一級屠豬士

iT邦新手 2 級 ‧ 2018-06-18 17:55:26

很棒的整理比較.補充一下關於大小寫部份資料,在一些情境下,大小寫分別不敏感是比較適合的.
Postgresql提供了一個extension, 增加了新的資料型態,這樣也不用修改系統參數等一連串動作.有極佳的使用彈性.以下提供範例,可以注意到index一樣能夠有作用.

--  case-insensitive text

[local]miku@miku=# create extension citext;
CREATE EXTENSION

--
create table ithelp180618a (
  id serial primary key
, name1 text not null  
);

insert into ithelp180618a (name1) values
('larry'), ('Tom'), ('Damian'), ('NEAL'), ('Bjørn');

select *
  from ithelp180618a
 where name1 = 'Larry';

 id | name1 
----+-------
(0 筆資料列)

----------
-- use the citext

create table ithelp180618b (
  id serial primary key
, name1 citext not null  
);

insert into ithelp180618b (name1)
select name1
  from ithelp180618a;

select *
  from ithelp180618b
 where name1 = 'Larry';

 id | name1 
----+-------
  1 | larry
(1 筆資料列)
  
-------------------
------- index -----

-- generate 10k rows fake data with fake2db ( https://github.com/emirozer/fake2db  )

create index on ithelp180618a ( lower(name1) );

create index on ithelp180618b ( name1 );

ANALYZE ithelp180618a;

ANALYZE ithelp180618b;
--
explain (ANALYZE, COSTS, VERBOSE, BUFFERS)
select *
  from ithelp180618a
 where lower(name1) = lower('Larry');  

                                                                  QUERY PLAN                                
------------------------------------------------------------------------------------------------------------
 Index Scan using ithelp180618a_lower_idx on miku1.ithelp180618a  (cost=0.29..8.30 rows=1 width=18) (actual 
   Output: id, name1
   Index Cond: (lower(ithelp180618a.name1) = 'larry'::text)
   Buffers: shared hit=1 read=2
 Planning time: 0.239 ms
 Execution time: 0.063 ms


explain (ANALYZE, COSTS, VERBOSE, BUFFERS)
select *
  from ithelp180618b
 where name1 = 'Larry';  
 
                                                                  QUERY PLAN                                
------------------------------------------------------------------------------------------------------------
 Index Scan using ithelp180618b_name1_idx on miku1.ithelp180618b  (cost=0.29..8.30 rows=1 width=18) (actual 
   Output: id, name1
   Index Cond: (ithelp180618b.name1 = 'Larry'::citext)
   Buffers: shared hit=1 read=2
 Planning time: 0.159 ms
 Execution time: 0.111 ms

回應 1
檢舉

暐翰 iT邦大師 1 級 ‧ 2018-06-18 21:22:41 檢舉

在一些情境下,大小寫分別不敏感是比較適合的

是，像是模糊查詢時想不區分大小寫時
這時就很方便

謝謝大大補充

登入發表回應

純真的人

iT邦大師 1 級 ‧ 2018-06-18 20:02:51

有查到SQL Server指定大小寫~必須修改資料庫的語言定序才可以~

轉貼~

【Tips】SQL查詢時設定大小寫認定

最近為一個系統要加入使用者帳號認證，在測試時才發現無論key入大寫或是小寫，都可以通過認證進入管理介面，之後找出了問題在於忽略了SQL Server 定序對於大小寫的區分。

寫下這篇文章也是給自己做個筆記 ^_^

 

如果在安裝SQL Server未特別指定資料庫定序的話，預設會是

Chinese_Taiwan_Stroke_CI_AS

其中的 _CI：不分大小寫；_AS：區分腔調字

相反的，如果要區分大小寫的話就必須用

_CS：區分大小寫；_AI：不區分腔調字

 

因此在對應要區分大小寫的使用者帳號、密碼時的SQL語法即可使用：

*/

SELECT left(idno,1) COLLATE Chinese_Taiwan_Stroke_CS_AI x,count(*) cnt 

FROM  dbo.Stud_studentinfo 

group by left(idno,1) COLLATE Chinese_Taiwan_Stroke_CS_AI

order by left(idno,1) COLLATE Chinese_Taiwan_Stroke_CS_AI

 

--SELECT left(idno,1) COLLATE Chinese_Taiwan_Stroke_CS_AI x,count(*) cnt 

select *

FROM  dbo.Stud_studentinfo 

where left(idno,1) COLLATE Chinese_Taiwan_Stroke_CS_AI='張'

/*想知道全部的資料庫定序清單的話，可以在SSMS中使用以下語法：*/

SELECT * FROM fn_helpcollations()

http://blog.xuite.net/hem1331.sk/programe/82348295-SQL%E6%9F%A5%E8%A9%A2%E6%99%82%E8%A8%AD%E5%AE%9A%E5%A4%A7%E5%B0%8F%E5%AF%AB%E8%AA%8D%E5%AE%9A

然後這是MYSQL指定大小的作法~

【Database】[轉載] 讓 MySQL 區分大小寫
原文出處：http://blog.phptw.idv.tw/read-182.html

1. 第一種方法是在查詢語句中欄位名稱前加上關鍵字「BINARY 」

SELECT * FROM tablename WHERE BINARY column = 'value';

2. 建立資料表時使用「BINARY的屬性」

CREATE TABLE `dbname`.`tablename ` ( `columnname` VARCHAR( 10 ) BINARY )

3. 使用 MySQL中用字符集編碼 使用 bin 型態

http://oldgrayduck.blogspot.com/2010/07/database-mysql.html

所以特殊字問題~應該也是要指定才會正常@@

select * from db 
where BINARY text = 'tuan';

http://sqlfiddle.com/#!9/e8f487/23

或者一開始欄位類型就要這樣了~

CREATE TABLE db 
    (`text` varchar(8) BINARY )
;
    
INSERT INTO db 
    (`text`)
VALUES
    ('tuấn'),
    ('tuan'),
    ('žybřidcz'),
    ('zybridcz')
;

select * from db 
where text = 'tuan';

http://sqlfiddle.com/#!9/9b4a2b/1

回應 1
檢舉

暐翰 iT邦大師 1 級 ‧ 2018-06-18 21:15:45 檢舉

第一次學到 mysql BINARY 用法.
謝謝大大補充!

登入發表回應

小碼農米爾

iT邦高手 1 級 ‧ 2018-06-18 21:57:49

分享兩種 MySql 的做法。

第一種純真的人大大上面有提到的 BINARY。

select * from db 
where text = binary 'tuan';

第二種可以改變定序為二進制定序。

select * from db 
where text = 'tuan' collate utf8_bin;

回應
檢舉

登入發表回應

我要留言

立即登入留言

參賽組數

1064 組

團體組數

40 組

累計文章數

22200 篇

完賽人數

602 人

15th鐵人賽 16th鐵人賽 13th鐵人賽 14th鐵人賽 12th鐵人賽 11th鐵人賽鐵人賽 2019鐵人賽 javascript 2018鐵人賽 python 2017鐵人賽 windows php c# windows server linux css react vue.js

IT邦幫忙